Logic-Based Information Integration and Machine Learning for Gene Regulation Prediction
Abstract
Introduction and Background
One of the central goals in computational and systems biology is to understand the mechanisms of gene transcriptional regulation at a system-wide level. These efforts are often based on high-throughput genomic data from model organisms such as S. cerevisiae. The goal of this work is to learn a model of gene regulation that predicts under which conditions genes are up- or down-regulated. Our starting point is the model of Middendorf et al. [1], in which the presence of transcription factor binding sites (motifs) in a gene's regulatory region and the expression levels of regulators (e.g., transcription factors or protein kinases) are used to predict gene regulation. In this formulation, important information related to gene regulation is clearly missing, for instance because of post-translational modifications. Information integration could therefore be extremely useful for filling in and taking into account the various missing pieces of information related to gene regulation.

Recognizing the multi-relational nature of the problem, we first rephrased it in a logic-oriented framework and defined predicates for the various interdependent pieces of information (see below). A logic-oriented representation enables the seamless integration of various data sources: genome-wide cDNA microarray data, motif profile data from regulatory sequences, and more. In particular, it is easy to take into account any information that might be related to gene regulation, for instance protein-protein interactions and functional categorizations. Given the data in a logical representation, we can apply a variety of algorithms and systems for learning classification and regression models in logic, most of them developed in the field of inductive logic programming (ILP). We chose the Tilde system [2] for learning logical decision trees, since it is known to perform well in terms of runtime and error rate.
Summing up, we believe that decoding the regulation mechanism of genes is an exciting new application of learning in logic, one that requires data integration from various sources and facilitates understanding at the system level.

Data, Representation and Results
The approach is tested on the S. cerevisiae data by Gasch [3]. As stated above, the goal is to learn a prediction model for the regulatory response of genes under different environmental conditions. In the following, we briefly discuss the predicates/relations in our logical formulation of the problem. In its most basic version, we have three different predicates: gene(GeneId, CondId, Level), hasTFBS(GeneId, BsId), and expression(RegId, CondId, RegLevel). The predicate gene(GeneId, CondId, Level) gives the expression level of each gene under a specific experimental condition. As in the study by Middendorf et al., gene expression is discretized and mapped onto three distinct values: +1 (up-regulated), 0, and -1 (down-regulated). The learning task is to predict the expression level for a given gene under a certain condition, given some
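
To make the representation concrete, the three predicates named above can be sketched as plain Python fact tables. This is only an illustrative sketch, not the Tilde input format: all identifiers in the toy data (g1, c1, m1, r1), the discretization threshold, and the helper names `discretize`, `holds` and `covered_examples` are assumptions introduced here. The function `holds` evaluates one conjunctive test of the kind a logical decision tree might place in an inner node.

```python
from typing import NamedTuple


class Gene(NamedTuple):          # gene(GeneId, CondId, Level)
    gene_id: str
    cond_id: str
    level: int                   # +1 (up), 0, or -1 (down)


class HasTFBS(NamedTuple):       # hasTFBS(GeneId, BsId)
    gene_id: str
    bs_id: str


class Expression(NamedTuple):    # expression(RegId, CondId, RegLevel)
    reg_id: str
    cond_id: str
    reg_level: int


# Hypothetical toy facts; every identifier here is invented for illustration.
GENES = [Gene("g1", "c1", 1), Gene("g1", "c2", 0), Gene("g2", "c1", -1)]
TFBS = [HasTFBS("g1", "m1"), HasTFBS("g2", "m2")]
EXPR = [Expression("r1", "c1", 1), Expression("r1", "c2", -1)]


def discretize(log_ratio: float, threshold: float = 1.0) -> int:
    """Map a continuous log expression ratio to +1, 0 or -1.

    The threshold value is an assumption; the source does not state one.
    """
    if log_ratio >= threshold:
        return 1
    if log_ratio <= -threshold:
        return -1
    return 0


def holds(gene_id: str, cond_id: str, bs_id: str,
          reg_id: str, reg_level: int) -> bool:
    """Evaluate the conjunction
    hasTFBS(gene_id, bs_id), expression(reg_id, cond_id, reg_level)."""
    has_site = HasTFBS(gene_id, bs_id) in TFBS
    reg_matches = Expression(reg_id, cond_id, reg_level) in EXPR
    return has_site and reg_matches


def covered_examples(bs_id: str, reg_id: str, reg_level: int):
    """Return the (gene, condition, level) examples for which the test holds."""
    return [
        (g.gene_id, g.cond_id, g.level)
        for g in GENES
        if holds(g.gene_id, g.cond_id, bs_id, reg_id, reg_level)
    ]


if __name__ == "__main__":
    print(covered_examples("m1", "r1", 1))
```

Keeping each predicate as a separate fact table mirrors the relational view: a further source, such as protein-protein interactions, would be added as one more table rather than as extra columns in a fixed feature vector.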